Abstract

Given a discrete-valued sample $X_1,\dots,X_n$, we wish to decide whether it was generated by a distribution belonging to a family $H_0$, or by a distribution belonging to a family $H_1$. In this work we assume that all distributions are stationary ergodic, and do not make any further assumptions (e.g., no independence or mixing rate assumptions). We would like to have a test whose probability of error (both Type I and Type II) is uniformly bounded. More precisely, we require that for each $\epsilon > 0$ there exists a sample size $n$ such that the probability of error is upper-bounded by $\epsilon$ for samples longer than $n$. We find some necessary and some sufficient conditions on $H_0$ and $H_1$ under which a consistent test (with this notion of consistency) exists. These conditions are topological, with respect to the topology of distributional distance.



1 Introduction 

Given a sample $X_1,\dots,X_n$ (where the $X_i$ are from a finite alphabet $A$) which is known to be generated by a stationary ergodic process, we wish to decide whether it was generated by a distribution belonging to a family $H_0$, versus a distribution belonging to a family $H_1$. Unlike most of the works on the subject, we do not assume that the $X_i$ are i.i.d., but only make the much weaker assumption that the distribution generating the sample is stationary ergodic.

A test is a function that takes a sample and gives a binary (possibly incorrect) answer: the sample was generated by a distribution from $H_0$ or from $H_1$. An answer $i\in\{0,1\}$ is correct if the sample is generated by a distribution that belongs to $H_i$. Here we are concerned with characterizing those pairs of $H_0$ and $H_1$ for which consistent tests exist.

Consistency. In this work we consider the following notion of consistency. For two hypotheses $H_0$ and $H_1$, a test is called uniformly consistent if for any $\epsilon > 0$ there is a sample size $n$ such that the probability of error on a sample of size larger than $n$ is not greater than $\epsilon$, whichever distribution from $H_0\cup H_1$ is chosen to generate the sample. Thus, a uniformly consistent test provides performance guarantees for finite sample sizes.

The results. Here we obtain some topological conditions on the hypotheses under which consistent tests exist, for the case of stationary ergodic distributions.

A distributional distance between two process distributions is defined as a weighted sum of the absolute differences between the probabilities that the two distributions assign to each possible tuple $X\in A^*$, where $A$ is the alphabet and the weights are positive and have a finite sum.

The test $\varphi_{H_0,H_1}$ that we construct is based on empirical estimates of distributional distance. It outputs 0 if the given sample is closer to the (closure of) $H_0$ than to the (closure of) $H_1$, and outputs 1 otherwise. The main result is as follows.

Theorem. Let $H_0, H_1\subset\mathcal{E}$, where $\mathcal{E}$ is the set of all stationary ergodic process distributions. If, for each $i\in\{0,1\}$, the set $H_i$ has probability 1 with respect to ergodic decompositions of every element of the closure of $H_i$, then there is a uniformly consistent test for $H_0$ against $H_1$. Conversely, if there is a uniformly consistent test for $H_0$ against $H_1$, then, for each $i\in\{0,1\}$, the set $H_{1-i}$ has probability 0 with respect to ergodic decompositions of every element of the closure of $H_i$.
Prior work. This work continues our previous research [11, 12], which provides similar necessary and sufficient conditions for the existence of a consistent test, for a weaker notion of asymmetric consistency: Type I error is uniformly bounded, while Type II error is required to tend to 0 as the sample size grows. Besides that, there is of course a vast body of literature on hypothesis testing for i.i.d. (real- or discrete-valued) data (see e.g. [3, 2]). There is, however, much less literature on hypothesis testing beyond i.i.d. or parametric models. For a weaker notion of consistency, namely, requiring that the test should stabilize on the correct answer for a.e. realization of the process (under either $H_0$ or $H_1$), [5] constructs a consistent test for so-called constrained finite-state model classes (including finite-state Markov and hidden Markov processes), against the general alternative of stationary ergodic processes. For the same notion of consistency, [7] gives sufficient conditions on two hypotheses $H_0$ and $H_1$ that consist of stationary ergodic real-valued processes, under which a consistent test exists, extending the results of [2] for i.i.d. data. The latter condition is that $H_0$ and $H_1$ are contained in disjoint $F_\sigma$ sets (countable unions of closed sets), with respect to the topology of weak convergence. Asymmetrically consistent tests for some specific hypotheses, but under the general alternative of stationary ergodic processes, have been proposed in [5, 11, 13], which address problems of testing identity, independence, estimating the order of a Markov process, and also the change point problem. Notably, the conceptually simple hypothesis of homogeneity (testing whether two samples are generated by the same or by different processes) does not admit a consistent test even in the weakest asymptotic sense, as was shown in [10].



2 Preliminaries 

Let $A$ be a finite alphabet, and denote $A^* = \bigcup_{i\in\mathbb{N}} A^i$ the set of words (or tuples). For a word $B$ the symbol $|B|$ stands for the length of $B$. Denote $B_i$ the $i$th element of $A^*$, enumerated in such a way that the elements of $A^i$ appear before the elements of $A^{i+1}$, for all $i\in\mathbb{N}$. Distributions or (stochastic) processes are probability measures on the space $(A^\infty, \mathcal{F}_{A^\infty})$, where $\mathcal{F}_{A^\infty}$ is the Borel sigma-algebra of $A^\infty$. Denote $\#(X,B)$ the number of occurrences of a word $B$ in a word $X\in A^*$ and $\nu(X,B)$ its frequency:

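$$\#(X,B) = \sum_{i=1}^{|X|-|B|+1} \mathbb{I}_{\{(X_i,\dots,X_{i+|B|-1}) = B\}},$$

and

$$\nu(X,B) = \begin{cases} \dfrac{1}{|X|-|B|+1}\,\#(X,B) & \text{if } |X|\ge|B|,\\[4pt] 0 & \text{otherwise,}\end{cases} \tag{1}$$

where $X = (X_1,\dots,X_{|X|})$. For example, $\nu(0001, 00) = 2/3$.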
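As a minimal sketch of definition (1), the following Python functions compute $\#(X,B)$ and $\nu(X,B)$ for words represented as strings. Exact rational arithmetic (`fractions.Fraction`) is used so that the example value $\nu(0001, 00) = 2/3$ is reproduced exactly. The function names are illustrative and do not come from the text.

```python
from fractions import Fraction


def occurrences(x: str, b: str) -> int:
    """#(X, B): number of (possibly overlapping) occurrences of word b in word x."""
    return sum(1 for i in range(len(x) - len(b) + 1) if x[i:i + len(b)] == b)


def frequency(x: str, b: str) -> Fraction:
    """nu(X, B) from (1): #(X, B) divided by the number |X| - |B| + 1 of windows."""
    if len(x) < len(b):
        return Fraction(0)
    return Fraction(occurrences(x, b), len(x) - len(b) + 1)


print(frequency("0001", "00"))  # 2/3 (two of the three windows 00, 00, 01)
print(frequency("01", "011"))   # 0 (the word is longer than the sample)
```

Overlapping occurrences are counted, which is why the denominator is the number of windows $|X|-|B|+1$ rather than $|X|/|B|$.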

We use the abbreviation $X_{1..k}$ for $X_1,\dots,X_k$. A process $\rho$ is stationary if

$$\rho(X_{1..|B|} = B) = \rho(X_{t..t+|B|-1} = B)$$

for any $B\in A^*$ and $t\in\mathbb{N}$. Denote $\mathcal{S}$ the set of all stationary processes on $A^\infty$. A stationary process $\rho$ is called (stationary) ergodic if the frequency of occurrence of each word $B$ in a sequence $X_1, X_2,\dots$ generated by $\rho$ tends to its a priori (or limiting) probability a.s.:

$$\rho\Big(\lim_{n\to\infty}\nu(X_{1..n}, B) = \rho(X_{1..|B|} = B)\Big) = 1.$$

Denote $\mathcal{E}$ the set of all stationary ergodic processes. We also write $\rho(B)$ for $\rho(X_{1..|B|} = B)$.
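The stationarity condition can be checked exactly on a small example. The sketch below assumes a hypothetical binary Markov chain (its transition probabilities are illustrative and do not come from the text) and compares $\rho(X_{t..t+|B|-1} = B)$ across starting times $t$. Started from its stationary law, the word probabilities do not depend on $t$. Started from a point mass, they do.

```python
from fractions import Fraction
from itertools import product

# Hypothetical binary Markov chain: P[a][b] = Pr(X_{t+1} = b | X_t = a).
P = {"0": {"0": Fraction(3, 4), "1": Fraction(1, 4)},
     "1": {"0": Fraction(1, 2), "1": Fraction(1, 2)}}
# Its stationary law, the solution of pi = pi P: pi(0) = 2/3, pi(1) = 1/3.
PI = {"0": Fraction(2, 3), "1": Fraction(1, 3)}


def marginal(start: dict, t: int) -> dict:
    """Law of X_t when X_1 has law `start` (push `start` forward t - 1 steps)."""
    dist = dict(start)
    for _ in range(t - 1):
        dist = {b: sum(dist[a] * P[a][b] for a in P) for b in P}
    return dist


def prob_at(start: dict, word: str, t: int) -> Fraction:
    """Pr(X_{t..t+|B|-1} = B) for the chain with initial law `start`."""
    p = marginal(start, t)[word[0]]
    for a, b in zip(word, word[1:]):
        p *= P[a][b]
    return p


words = ["".join(w) for n in (1, 2, 3) for w in product("01", repeat=n)]
# Started from PI the chain is stationary: no word probability depends on t.
print(all(prob_at(PI, w, t) == prob_at(PI, w, 1) for w in words for t in range(2, 6)))
# Started deterministically in state 0 it is not: Pr(X_1 = 0) = 1, Pr(X_2 = 0) = 3/4.
point_mass = {"0": Fraction(1), "1": Fraction(0)}
print(prob_at(point_mass, "0", 1), prob_at(point_mass, "0", 2))
```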
A distributional distance is defined for a pair of processes $\rho_1, \rho_2$ as follows [3]:

$$d(\rho_1,\rho_2) = \sum_{i=1}^\infty w_i\,\big|\rho_1(X_{1..|B_i|} = B_i) - \rho_2(X_{1..|B_i|} = B_i)\big|,$$

where $w_i$ are summable positive real weights (e.g., $w_k = 2^{-k}$; we fix this choice for the sake of concreteness). It is easy to see that $d$ is a metric. Equipped with this metric, the space of all stochastic processes is compact, and the set of stationary processes $\mathcal{S}$ is its convex closed subset. (The set $\mathcal{E}$ is not closed.) When talking about closed and open subsets of $\mathcal{S}$ we assume the topology of $d$. Compactness of the set $\mathcal{S}$ is one of the main ingredients in the proofs of the main results. Another is that the distance $d$ can be consistently estimated, as the following lemma shows (because of its importance for further development, we give it with a proof).
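For concreteness, here is a sketch of $d$ with the weights $w_i = 2^{-i}$ and an enumeration of $A^*$ by length. It is truncated to words of bounded length so that it can be computed. The i.i.d. processes used as inputs are hypothetical examples. Since $|\rho_1(B) - \rho_2(B)| \le 1$, the truncation error is at most the total weight of the omitted words.

```python
from itertools import product


def words_up_to(alphabet: str, max_len: int) -> list:
    """B_1, B_2, ...: all words of length 1, then of length 2, ..., up to max_len."""
    return ["".join(w) for n in range(1, max_len + 1) for w in product(alphabet, repeat=n)]


def iid_prob(p1: float):
    """Word probabilities of a hypothetical i.i.d. process on {'0', '1'} with Pr(1) = p1."""
    def prob(word: str) -> float:
        ones = word.count("1")
        return p1 ** ones * (1 - p1) ** (len(word) - ones)
    return prob


def distance(prob1, prob2, alphabet: str = "01", max_len: int = 8) -> float:
    """d(rho_1, rho_2) with w_i = 2^{-i}, truncated to words of length <= max_len.

    The omitted terms sum to at most the total weight of the longer words, which for
    this enumeration is 2^{-N}, N being the number of words of length <= max_len.
    """
    return sum(2.0 ** -(i + 1) * abs(prob1(b) - prob2(b))
               for i, b in enumerate(words_up_to(alphabet, max_len)))


zeros, ones, fair = iid_prob(0.0), iid_prob(1.0), iid_prob(0.5)
print(distance(zeros, ones, max_len=1))  # 0.75 = 1/2 + 1/4
print(distance(zeros, ones, max_len=2))  # 0.890625 = 0.75 + 1/8 + 1/64
```

The printed values match a hand computation. For the all-zeros and all-ones processes, only the words $0$, $1$, $00$ and $11$ (with indices 1, 2, 3 and 6) contribute up to length 2.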



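Lemma 1 ($d$ is consistent [13]). Let $\rho, \xi\in\mathcal{E}$ and let a sample $X_{1..k}$ be generated by $\rho$. Then

$$\lim_{k\to\infty} d(X_{1..k},\xi) = d(\rho,\xi)\quad\rho\text{-a.s.}$$

Proof. For any $\epsilon > 0$ find an index $J$ such that $\sum_{i=J+1}^\infty w_i < \epsilon/2$. For each $j$ we have $\lim_{k\to\infty}\nu(X_{1..k}, B_j) = \rho(B_j)$ a.s., so that $|\nu(X_{1..k}, B_j) - \rho(B_j)| < \epsilon/(2Jw_j)$ from some $k$ on; denote this $k$ by $K_j$. Let $K = \max_{j\le J} K_j$ ($K$ depends on the realization $X_1, X_2,\dots$). Thus, for $k > K$ we have

$$\begin{aligned}
|d(X_{1..k},\xi) - d(\rho,\xi)| &= \Big|\sum_{i=1}^\infty w_i\big(|\nu(X_{1..k},B_i) - \xi(B_i)| - |\rho(B_i) - \xi(B_i)|\big)\Big|\\
&\le \sum_{i=1}^\infty w_i\,|\nu(X_{1..k},B_i) - \rho(B_i)|\\
&\le \sum_{i=1}^J w_i\,|\nu(X_{1..k},B_i) - \rho(B_i)| + \epsilon/2\\
&< \sum_{i=1}^J w_i\,\frac{\epsilon}{2Jw_i} + \epsilon/2 = \epsilon,
\end{aligned}$$

where the second inequality uses $|\nu(X_{1..k},B_i) - \rho(B_i)| \le 1$ for the terms with $i > J$. This proves the statement. □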
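Lemma 1 can be illustrated numerically; this is an illustration, not a substitute for the proof. The sketch assumes hypothetical i.i.d. processes $\rho$ = Bernoulli(0.3) and $\xi$ = Bernoulli(0.6), which are stationary ergodic. It truncates both $d(X_{1..k},\xi)$ and $d(\rho,\xi)$ to words of length at most 3 and uses a seeded pseudo-random sample. The printed gap is expected to shrink as $k$ grows.

```python
import random
from collections import Counter
from itertools import product


def iid_prob(p1):
    """Word probabilities of a hypothetical i.i.d. process on {'0', '1'} with Pr(1) = p1."""
    def prob(word):
        ones = word.count("1")
        return p1 ** ones * (1 - p1) ** (len(word) - ones)
    return prob


def empirical_distance(sample, prob, alphabet="01", max_len=3):
    """d(X_{1..n}, xi) with w_i = 2^{-i}, truncated to words of length <= max_len."""
    total, index = 0.0, 0
    for length in range(1, max_len + 1):
        windows = len(sample) - length + 1
        counts = Counter(sample[i:i + length] for i in range(max(windows, 0)))
        for letters in product(alphabet, repeat=length):
            word = "".join(letters)
            index += 1  # position i of the word B_i in the enumeration
            freq = counts[word] / windows if windows > 0 else 0.0
            total += 2.0 ** -index * abs(freq - prob(word))
    return total


def distance(prob1, prob2, alphabet="01", max_len=3):
    """d(rho, xi), truncated in the same way, for comparison."""
    words = ["".join(w) for n in range(1, max_len + 1) for w in product(alphabet, repeat=n)]
    return sum(2.0 ** -(i + 1) * abs(prob1(b) - prob2(b)) for i, b in enumerate(words))


rng = random.Random(0)
rho, xi = iid_prob(0.3), iid_prob(0.6)
sample = "".join("1" if rng.random() < 0.3 else "0" for _ in range(50_000))
target = distance(rho, xi)
for k in (100, 1_000, 50_000):
    print(k, abs(empirical_distance(sample[:k], xi) - target))
```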



Considering the Borel (with respect to the metric $d$) sigma-algebra $\mathcal{F}_{\mathcal{S}}$ on the set $\mathcal{S}$, we obtain a standard measurable space $(\mathcal{S},\mathcal{F}_{\mathcal{S}})$. An important tool that will be used in the analysis is the ergodic decomposition of stationary processes (see e.g. [2, 1]): any stationary process can be expressed as a mixture of stationary ergodic processes. More formally, for any $\rho\in\mathcal{S}$ there is a measure $W_\rho$ on $(\mathcal{S},\mathcal{F}_{\mathcal{S}})$ such that $W_\rho(\mathcal{E}) = 1$ and $\rho(B) = \int dW_\rho(\mu)\,\mu(B)$ for any $B\in\mathcal{F}_{A^\infty}$. The support of a stationary distribution $\rho$ is the minimal closed set $U\subset\mathcal{S}$ such that $W_\rho(U) = 1$.
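The role of the ergodic decomposition can be seen on the simplest non-ergodic example. The sketch below assumes a hypothetical $W_\rho$ with two atoms: the all-zeros process and the all-ones process, each with weight 1/2. By the mixture formula $\rho(0) = 1/2$. Yet every single realization has frequency $\nu(X_{1..n}, 0)$ equal to 0 or 1. Frequencies converge to the probabilities of the ergodic component that generated the sample, not to $\rho(0)$.

```python
import random


def iid_prob(p1):
    """Word probabilities of a hypothetical i.i.d. process on {'0', '1'} with Pr(1) = p1."""
    def prob(word):
        ones = word.count("1")
        return p1 ** ones * (1 - p1) ** (len(word) - ones)
    return prob


def mixture_prob(word, components, weights):
    """rho(B) = integral of mu(B) dW_rho(mu), for a W_rho with finitely many atoms."""
    return sum(w * mu(word) for mu, w in zip(components, weights))


def sample_mixture(params, weights, n, rng):
    """Pick one ergodic component mu ~ W_rho, then draw X_1..X_n from mu."""
    p1 = rng.choices(params, weights=weights)[0]
    return "".join("1" if rng.random() < p1 else "0" for _ in range(n))


# Hypothetical W_rho: mass 1/2 on the all-zeros process and 1/2 on the all-ones process.
params, weights = [0.0, 1.0], [0.5, 0.5]
components = [iid_prob(p) for p in params]
print(mixture_prob("0", components, weights))  # 0.5
rng = random.Random(1)
runs = [sample_mixture(params, weights, 1_000, rng) for _ in range(5)]
print([run.count("0") / len(run) for run in runs])  # every entry is 0.0 or 1.0
```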

A test is a function $\varphi: A^*\to\{0,1\}$ that takes a sample and outputs a binary answer, where the answer $i$ is interpreted as "the sample was generated by a distribution that belongs to $H_i$". The answer $i$ is correct if the sample was indeed generated by a distribution from $H_i$; otherwise we say that the test made an error.



A test $\varphi$ is called uniformly consistent if for every $\alpha > 0$ there is an $n_\alpha\in\mathbb{N}$ such that for every $n\ge n_\alpha$ the probability of error on a sample of size $n$ is less than $\alpha$: $\rho(X\in A^n : \varphi(X) = i) < \alpha$ for every $\rho\in H_{1-i}$ and every $i\in\{0,1\}$.

3 Main results 

The tests presented below are based on empirical estimates of the distributional distance $d$:

$$d(X_{1..n},\rho) = \sum_{i=1}^\infty w_i\,|\nu(X_{1..n},B_i) - \rho(B_i)|,$$

where $n\in\mathbb{N}$, $\rho\in\mathcal{S}$, $X_{1..n}\in A^n$. That is, $d(X_{1..n},\rho)$ measures the discrepancy between empirically estimated and theoretical probabilities. For a sample $X_{1..n}\in A^n$ and a hypothesis $H\subset\mathcal{S}$ define

$$d(X_{1..n}, H) = \inf_{\rho\in H} d(X_{1..n},\rho).$$

For $H\subset\mathcal{S}$, denote $\operatorname{cl} H$ the closure of $H$ (with respect to the topology of $d$).

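For $H_0, H_1\subset\mathcal{S}$, the uniform test $\varphi_{H_0,H_1}$ is constructed as follows. For each $n\in\mathbb{N}$ let

$$\varphi_{H_0,H_1}(X_{1..n}) = \begin{cases} 0 & \text{if } d(X_{1..n}, \operatorname{cl} H_0\cap\mathcal{E}) < d(X_{1..n}, \operatorname{cl} H_1\cap\mathcal{E}),\\ 1 & \text{otherwise.}\end{cases} \tag{2}$$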
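Here is a minimal sketch of the test (2) for the special case where $H_0$ and $H_1$ are finite sets of i.i.d. processes. Such sets are closed and consist of ergodic processes, so $\operatorname{cl} H_i\cap\mathcal{E} = H_i$ and the infimum becomes a minimum. Two simplifications are assumptions of the sketch and not part of the construction in the text. First, the hypotheses are hypothetical Bernoulli families. Second, the empirical distance is truncated to words of length at most 3.

```python
import random
from collections import Counter
from itertools import product


def iid_prob(p1):
    """Word probabilities of a hypothetical i.i.d. process on {'0', '1'} with Pr(1) = p1."""
    def prob(word):
        ones = word.count("1")
        return p1 ** ones * (1 - p1) ** (len(word) - ones)
    return prob


def empirical_distance(sample, prob, alphabet="01", max_len=3):
    """d(X_{1..n}, rho) with w_i = 2^{-i}, truncated to words of length <= max_len."""
    total, index = 0.0, 0
    for length in range(1, max_len + 1):
        windows = len(sample) - length + 1
        counts = Counter(sample[i:i + length] for i in range(max(windows, 0)))
        for letters in product(alphabet, repeat=length):
            word = "".join(letters)
            index += 1
            freq = counts[word] / windows if windows > 0 else 0.0
            total += 2.0 ** -index * abs(freq - prob(word))
    return total


def distance_to_set(sample, hypothesis):
    """d(X_{1..n}, H): the infimum is a minimum because H is a finite list here."""
    return min(empirical_distance(sample, prob) for prob in hypothesis)


def uniform_test(sample, h0, h1):
    """phi_{H0,H1} from (2): 0 if the sample is strictly closer to H0 than to H1, else 1."""
    return 0 if distance_to_set(sample, h0) < distance_to_set(sample, h1) else 1


# Hypothetical finite hypotheses made of i.i.d. (hence stationary ergodic) processes.
h0 = [iid_prob(0.2), iid_prob(0.25)]
h1 = [iid_prob(0.7), iid_prob(0.8)]
rng = random.Random(2)
x0 = "".join("1" if rng.random() < 0.2 else "0" for _ in range(2_000))
x1 = "".join("1" if rng.random() < 0.8 else "0" for _ in range(2_000))
print(uniform_test(x0, h0, h1), uniform_test(x1, h0, h1))  # 0 1
```

Ties are resolved in favour of answer 1, exactly as in (2).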



Theorem 1 (uniform testing). Let $H_0, H_1\subset\mathcal{S}$. If $W_\rho(H_i) = 1$ for every $\rho\in\operatorname{cl} H_i$, $i\in\{0,1\}$, then the test $\varphi_{H_0,H_1}$ is uniformly consistent. Conversely, if there exists a uniformly consistent test for $H_0$ against $H_1$, then $W_\rho(H_{1-i}) = 0$ for any $\rho\in\operatorname{cl} H_i$, $i\in\{0,1\}$.

The proof is deferred to Section 5.



4 Examples 

First of all, it is obvious that sets that consist of just one or finitely many stationary ergodic processes are closed and closed under ergodic decompositions. Therefore, for any pair of disjoint sets of this type, there exists a uniformly consistent test. (In particular, there is a uniformly consistent test for $H_0 = \{\rho_0\}$ against $H_1 = \{\rho_1\}$, where $\rho_0,\rho_1\in\mathcal{E}$ and $\rho_0\ne\rho_1$.)

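It is clear that for any $\rho_0\in\mathcal{E}$ there is no uniformly consistent test for $\{\rho_0\}$ against $\mathcal{E}\setminus\{\rho_0\}$. More generally, for any non-empty $H_0$ there is no uniformly consistent test for $H_0$ against $\mathcal{E}\setminus H_0$, provided the latter complement is also non-empty. Indeed, this follows from Theorem 1. In these cases the closures of $H_0$ and $H_1 = \mathcal{E}\setminus H_0$ are not disjoint, and for any $\rho$ in their intersection the theorem would require $W_\rho(H_0) = W_\rho(H_1) = 0$. This is impossible since $W_\rho(H_0\cup H_1) = W_\rho(\mathcal{E}) = 1$. One might suggest at this point that a uniformly consistent test exists if we restrict $H_1$ to those processes that are sufficiently far from $\rho_0$. However, this is not true. We can prove an even stronger negative result.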

Proposition 1. Let $\rho,\nu\in\mathcal{E}$, $\rho\ne\nu$, and let $\epsilon > 0$. There is no uniformly consistent test for $H_0 = \{\rho\}$ against $H_1 = \{\nu'\in\mathcal{E} : d(\nu',\nu) < \epsilon\}$.

The proof of the proposition is deferred to the appendix. What it means is that, while distributional distance is well suited for characterizing those hypotheses for which consistent tests exist, it is not suited for formulating the actual hypotheses. Apparently, a stronger distance is needed for the latter.

The following statement is easy to demonstrate from Theorem 1.

Corollary 1. Given two disjoint sets $H_0$ and $H_1$, each of which is continuously parametrized by a compact set of parameters and is closed under taking ergodic decompositions, there exists a uniformly consistent test of $H_0$ against $H_1$.

Examples of parametrisations mentioned in the Corollary are the sets of $k$-order Markov sources, parametrised by transition probabilities. Thus, any two disjoint closed subsets of these sets satisfy the assumption of the Corollary.
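As an illustration of such a parametrisation, the sketch below takes first-order sources over a binary alphabet. A stationary Markov chain is then given by the two switching probabilities $a = \Pr(0\to 1)$ and $b = \Pr(1\to 0)$. Every word probability is a continuous (rational) function of $(a, b)$ for $0 < a, b < 1$, for instance on a compact parameter set such as $[c, 1-c]^2$ with $c > 0$. The parameter values are hypothetical. The code checks that a small change of $a$ changes the word probabilities only slightly.

```python
from itertools import product


def markov_prob(a: float, b: float):
    """Word probabilities of the stationary binary first-order Markov chain with
    Pr(0 -> 1) = a and Pr(1 -> 0) = b (hypothetical parametrization, 0 < a, b < 1)."""
    trans = {("0", "0"): 1 - a, ("0", "1"): a, ("1", "0"): b, ("1", "1"): 1 - b}
    start = {"0": b / (a + b), "1": a / (a + b)}  # stationary: start(0) * a == start(1) * b

    def prob(word: str) -> float:
        p = start[word[0]]
        for s, t in zip(word, word[1:]):
            p *= trans[(s, t)]
        return p
    return prob


def max_word_gap(prob1, prob2, max_len: int = 4) -> float:
    """Largest difference in probability over all words of length <= max_len."""
    return max(abs(prob1("".join(w)) - prob2("".join(w)))
               for n in range(1, max_len + 1) for w in product("01", repeat=n))


rho = markov_prob(0.3, 0.6)
for eps in (1e-1, 1e-3, 1e-5):
    # Nudging the parameter a moves every word probability only a little.
    print(eps, max_word_gap(rho, markov_prob(0.3 + eps, 0.6)))
```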



5 Proofs 



The proof of Theorem 1 will use the following lemmas, whose proofs can be found in [11].

Lemma 2 (smooth probabilities of deviation). Let $m > 2k > 1$, $\rho\in\mathcal{S}$, $H\subset\mathcal{S}$, and $\epsilon > 0$. Then

$$\rho\big(d(X_{1..m}, H) > \epsilon\big) \le \rho\Big(d(X_{1..k}, H) > \epsilon - \frac{2k}{m-k+1} - t_k\Big), \tag{3}$$

where $t_k$ is the sum of all the weights of tuples longer than $k$ in the definition of $d$: $t_k := \sum_{i:|B_i|>k} w_i$, and

$$\rho\big(d(X_{1..m}, H) \le \epsilon\big) \le \rho\Big(d(X_{1..k}, H) \le \frac{m\epsilon}{m-k+1} + \frac{2k}{m-k+1}\Big). \tag{4}$$

The meaning of this lemma is as follows. For any word $X_{1..m}$, if it is far away from (or close to) a given distribution $\rho$ (in the empirical distributional distance), then at least one of its shorter subwords $X_{i..i+k-1}$ is far from (close to) $\rho$ too. By stationarity, we may assume that $i = 1$. Therefore, the probability of a $\delta$-ball of samples of a given length is close to the probability of a $\delta$-ball of samples of smaller size. In other words, it cannot happen that a small sample is likely to be close to $\rho$, but a larger sample is likely to be far.

Lemma 3. Let $\rho_k\in\mathcal{S}$, $k\in\mathbb{N}$, be a sequence of processes that converges to a process $\rho_*$. Then, for any $n\in\mathbb{N}$, $T\subseteq A^n$ and $\epsilon > 0$, if $\rho_k(T) > \epsilon$ for infinitely many indices $k$, then $\rho_*(T)\ge\epsilon$.

This statement follows from the fact that $\rho(T)$ is continuous as a function of $\rho$.

Proof of Theorem 1. To prove the first statement of the theorem, we will show that the test $\varphi_{H_0,H_1}$ is a uniformly consistent test for $\operatorname{cl} H_0\cap\mathcal{E}$ against $\operatorname{cl} H_1\cap\mathcal{E}$ (and hence for $H_0$ against $H_1$), under the conditions of the theorem. Suppose that, on the contrary, for some $\alpha > 0$ and for every $n'\in\mathbb{N}$ there is a process $\rho\in\operatorname{cl} H_0$ such that $\rho(\varphi(X_{1..n}) = 1) > \alpha$ for some $n > n'$. Define

$$\Delta := \inf_{\rho_0\in\operatorname{cl} H_0\cap\mathcal{E},\ \rho_1\in\operatorname{cl} H_1\cap\mathcal{E}} d(\rho_0,\rho_1),$$

which is positive since $\operatorname{cl} H_0$ and $\operatorname{cl} H_1$ are closed (hence compact, as closed subsets of the compact set $\mathcal{S}$) and disjoint. We have

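$$\begin{aligned}
\alpha &< \rho(\varphi(X_{1..n}) = 1)\\
&\le \rho\big(d(X_{1..n},\operatorname{cl} H_0) > \Delta/2 \text{ or } d(X_{1..n},\operatorname{cl} H_1) < \Delta/2\big)\\
&\le \rho\big(d(X_{1..n},\operatorname{cl} H_0) > \Delta/2\big) + \rho\big(d(X_{1..n},\operatorname{cl} H_1) < \Delta/2\big).
\end{aligned}\tag{5}$$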



This implies that either $\rho(d(X_{1..n},\operatorname{cl} H_0) > \Delta/2) > \alpha/2$ or $\rho(d(X_{1..n},\operatorname{cl} H_1) < \Delta/2) > \alpha/2$. So, by assumption, at least one of these inequalities holds for infinitely many $n\in\mathbb{N}$ for some sequence $\rho_n\in\operatorname{cl} H_0$. Suppose that it is the first one, that is, there is an increasing sequence $n_i$, $i\in\mathbb{N}$, and a sequence $\rho_i\in\operatorname{cl} H_0$, $i\in\mathbb{N}$, such that

$$\rho_i\big(d(X_{1..n_i},\operatorname{cl} H_0) > \Delta/2\big) > \alpha/2\quad\text{for all } i\in\mathbb{N}. \tag{6}$$

The set $\mathcal{S}$ is compact, hence so is its closed subset $\operatorname{cl} H_0$. Therefore, the sequence $\rho_i$, $i\in\mathbb{N}$, must contain a subsequence that converges to a certain process $\rho_*\in\operatorname{cl} H_0$. Passing to a subsequence if necessary, we may assume that this convergent subsequence is the sequence $\rho_i$, $i\in\mathbb{N}$, itself.

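Using Lemma 2, inequality (3) (with $\rho = \rho_m$, $m = n_m$, $k = n_k$, $\epsilon = \Delta/2$ and $H = \operatorname{cl} H_0$), we take $k$ large enough to have $t_{n_k} < \Delta/8$. Then, for every $m$ large enough to have $2n_k/(n_m - n_k + 1) < \Delta/8$, the threshold on the right-hand side of (3) exceeds $\Delta/2 - \Delta/8 - \Delta/8 = \Delta/4$, and we obtain

$$\rho_m\big(d(X_{1..n_k},\operatorname{cl} H_0) > \Delta/4\big) \ge \rho_m\big(d(X_{1..n_m},\operatorname{cl} H_0) > \Delta/2\big) > \alpha/2. \tag{7}$$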
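The choice of sample sizes in this step can be made concrete. With $w_i = 2^{-i}$ and words enumerated by length, $t_k = \sum_{i > N_k} 2^{-i} = 2^{-N_k}$, where $N_k = \sum_{j=1}^k |A|^j$ is the number of words of length at most $k$. Suppose both correction terms in (3) are below $\eta = \Delta/8$ and $\epsilon = \Delta/2$. Then the threshold $\epsilon - 2k/(m-k+1) - t_k$ exceeds $\Delta/4$. The sketch below uses a hypothetical value $\Delta = 0.1$ and a binary alphabet. It finds the smallest such $k$, then the smallest admissible $m$.

```python
def tail_weight(alphabet_size: int, k: int) -> float:
    """t_k for w_i = 2^{-i} when words are enumerated by length: equals 2^{-N_k},
    where N_k is the number of words of length at most k."""
    n_k = sum(alphabet_size ** j for j in range(1, k + 1))
    return 2.0 ** -n_k


def smallest_k(alphabet_size: int, eta: float) -> int:
    """Smallest sample length k >= 1 with t_k < eta."""
    k = 1
    while tail_weight(alphabet_size, k) >= eta:
        k += 1
    return k


def smallest_m(k: int, eta: float) -> int:
    """Smallest m > 2k (as Lemma 2 requires) with 2k / (m - k + 1) < eta."""
    m = 2 * k + 1
    while 2 * k / (m - k + 1) >= eta:
        m += 1
    return m


delta = 0.1          # hypothetical value of Delta
eta = delta / 8      # margin for each of the two correction terms in (3)
k = smallest_k(2, eta)
m = smallest_m(k, eta)
print(k, m)  # 3 483 for a binary alphabet
# With eps = Delta / 2, the threshold on the right-hand side of (3) then exceeds Delta / 4.
print(delta / 2 - 2 * k / (m - k + 1) - tail_weight(2, k) > delta / 4)  # True
```

The tail weight $t_k$ decays doubly exponentially in $k$, so in this example the binding constraint is the term $2k/(m-k+1)$, which forces $m$ to be of order $2k/\eta$.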

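That is, we have shown that for any large enough index $n_k$ the inequality $\rho_m(d(X_{1..n_k},\operatorname{cl} H_0) > \Delta/4) > \alpha/2$ holds for infinitely many indices $m$. From this and Lemma 3 with $T = T_k := \{X\in A^{n_k} : d(X,\operatorname{cl} H_0) > \Delta/4\}$ we conclude that $\rho_*(T_k)\ge\alpha/2$. The latter holds for all sufficiently large $k$; that is, $\rho_*(d(X_{1..n_k},\operatorname{cl} H_0) > \Delta/4)\ge\alpha/2$ for infinitely many $k$. Therefore (since the probability of infinitely many of a sequence of events occurring is at least the upper limit of their probabilities),

$$\rho_*\Big(\limsup_{n\to\infty} d(X_{1..n},\operatorname{cl} H_0) \ge \Delta/4\Big) \ge \alpha/2 > 0.$$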
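However, we must have

$$\rho_*\Big(\lim_{n\to\infty} d(X_{1..n},\operatorname{cl} H_0) = 0\Big) = 1$$

for every $\rho_*\in\operatorname{cl} H_0$. Indeed, for $\rho_*\in\operatorname{cl} H_0\cap\mathcal{E}$ it follows from Lemma 1, and for $\rho_*\in\operatorname{cl} H_0\setminus\mathcal{E}$ from Lemma 1, ergodic decomposition and the conditions of the theorem.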

Thus, we have arrived at a contradiction, which shows that $\rho_n(d(X_{1..n},\operatorname{cl} H_0) > \Delta/2) > \alpha/2$ cannot hold for infinitely many $n\in\mathbb{N}$ for any sequence of $\rho_n\in\operatorname{cl} H_0$. Analogously, we can show that $\rho_n(d(X_{1..n},\operatorname{cl} H_1) < \Delta/2) > \alpha/2$ cannot hold for infinitely many $n\in\mathbb{N}$ for any sequence of $\rho_n\in\operatorname{cl} H_0$. Indeed, using Lemma 2, inequality (4), we can show that $\rho_{n_m}(d(X_{1..n_m},\operatorname{cl} H_1) < \Delta/2) > \alpha/2$ for a large enough $n_m$ implies $\rho_{n_m}(d(X_{1..n_k},\operatorname{cl} H_1) < 3\Delta/4) > \alpha/2$ for a smaller $n_k$. Suppose, then, that $\rho_n(d(X_{1..n},\operatorname{cl} H_1) < \Delta/2) > \alpha/2$ for infinitely many $n\in\mathbb{N}$ for some sequence of $\rho_n\in\operatorname{cl} H_0$. Then we will also find a $\rho_*$ for which $\rho_*(d(X_{1..n},\operatorname{cl} H_1) < 3\Delta/4)\ge\alpha/2$ for infinitely many $n$. Using Lemma 1 and ergodic decomposition, this can be shown to contradict the fact that $\rho_*(\lim_{n\to\infty} d(X_{1..n},\operatorname{cl} H_1)\ge\Delta) = 1$.

Thus, returning to (5), we have shown that from some $n$ on there is no $\rho\in\operatorname{cl} H_0$ for which $\rho(\varphi(X_{1..n}) = 1) > \alpha$ holds true. The statement for $\rho\in\operatorname{cl} H_1$ can be proven analogously, thereby finishing the proof of the first statement.

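To prove the second statement of the theorem, we assume that there exists a uniformly consistent test $\varphi$ for $H_0$ against $H_1$, and we will show that $W_\rho(H_{1-i}) = 0$ for every $\rho\in\operatorname{cl} H_i$. Indeed, let $\rho\in\operatorname{cl} H_0$, that is, suppose that there is a sequence $\xi_i\in H_0$, $i\in\mathbb{N}$, such that $\xi_i\to\rho$. Assume $W_\rho(H_1) = \delta > 0$ and take $\alpha := \delta/2$. Since the test $\varphi$ is uniformly consistent, there is an $N\in\mathbb{N}$ such that for every $n > N$ we have

$$\begin{aligned}
\rho(\varphi(X_{1..n}) = 0) &= \int_{H_1}\mu(\varphi(X_{1..n}) = 0)\,dW_\rho(\mu) + \int_{\mathcal{S}\setminus H_1}\mu(\varphi(X_{1..n}) = 0)\,dW_\rho(\mu)\\
&< \delta\alpha + 1 - \delta \le 1 - \delta/2.
\end{aligned}$$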
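The last inequality in the display above is elementary. With $\alpha = \delta/2$ and $0 < \delta\le 1$ it can be verified as follows:

```latex
\delta\alpha + 1 - \delta
  = 1 - \delta + \frac{\delta^2}{2}
  = 1 - \frac{\delta}{2} - \frac{\delta(1-\delta)}{2}
  \le 1 - \frac{\delta}{2}.
```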

Recall that, for $T\subseteq A^n$, $\rho(T)$ is a continuous function of $\rho$. In particular, this holds for the set $T = \{X\in A^n : \varphi(X) = 0\}$, for any given $n\in\mathbb{N}$. Therefore, for every $n > N$ and for every $i$ large enough, $\rho(\varphi(X_{1..n}) = 0) < 1 - \delta/2$ implies also $\xi_i(\varphi(X_{1..n}) = 0) < 1 - \delta/2$. This contradicts $\xi_i\in H_0$, since uniform consistency gives $\xi_i(\varphi(X_{1..n}) = 1) < \alpha = \delta/2$. This contradiction shows $W_\rho(H_1) = 0$ for every $\rho\in\operatorname{cl} H_0$. The case $\rho\in\operatorname{cl} H_1$ is analogous. The theorem is proven.

Proof of Proposition 1. Assume $d(\rho,\nu)\ge\epsilon$ (the other case is obvious, since then $\rho\in H_1$). Consider the process $(x_1,y_1), (x_2,y_2),\dots$ on pairs $(x_i,y_i)\in A^2$ such that the distribution of $x_1, x_2,\dots$ is $\nu$, the distribution of $y_1, y_2,\dots$ is $\rho$, and the two components $x_i$ and $y_i$ are independent; in other words, the distribution of $(x_i,y_i)$ is $\nu\times\rho$. Consider also a two-state stationary ergodic Markov chain $\mu$, with two states 1 and 2, whose transition probabilities are

$$\begin{pmatrix} 1-p & p\\ q & 1-q\end{pmatrix},$$

where $0 < p, q < 1$. The limiting (and initial) probability of the state 1 is $q/(p+q)$ and that of the state 2 is $p/(p+q)$. Finally, the process $z_1, z_2,\dots$ is constructed as follows: $z_i = x_i$ if $\mu$ is in the state 2 and $z_i = y_i$ otherwise (here it is assumed that the chain $\mu$ generates a sequence of outcomes independently of $(x_i,y_i)$). Clearly, for every $p, q$ satisfying $0 < p, q < 1$ the process $z_1, z_2,\dots$ is stationary ergodic; denote $\zeta$ its distribution. Let $p_n := 1/(n+1)$, $n\in\mathbb{N}$, and for $\delta\in(0, 1/2)$ let $\zeta_n$ be the distribution $\zeta$ with parameters $p_n$ and $q_n$, where $q_n$ satisfies $q_n/(p_n+q_n) = \delta$. For every word $B$, $z_{1..|B|} = x_{1..|B|}$ unless the chain visits state 1 among its first $|B|$ steps, which happens with probability at most $|B|\delta$. Hence $|\zeta_n(B) - \nu(B)|\le|B|\delta$, and we can find a $\delta > 0$ such that $d(\nu,\zeta_n) < \epsilon$ for all $n$. Thus, $\zeta_n\in H_1$ for all $n\in\mathbb{N}$. However, $\lim_{n\to\infty}\zeta_n = \zeta_\infty$, where $\zeta_\infty$ is the stationary distribution with $W_{\zeta_\infty}(\{\rho\}) = \delta$ and $W_{\zeta_\infty}(\{\nu\}) = 1-\delta$. Therefore, $\zeta_\infty\in\operatorname{cl} H_1$ and $W_{\zeta_\infty}(H_0) > 0$, so that by Theorem 1 there is no uniformly consistent test for $H_0$ against $H_1$, which concludes the proof.
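The limit $\zeta_n\to\zeta_\infty$ used in the proof can be checked numerically in a special case. The sketch below assumes that both components are hypothetical i.i.d. Bernoulli processes. Then $z_1, z_2,\dots$ is a hidden Markov chain, and its word probabilities can be computed exactly with the forward recursion. Index 0 in the code refers to the first row of the transition matrix. The sketch takes $p_n = 1/(n+1)$ and $q_n$ with $q_n/(p_n+q_n) = \delta$. It reports the largest difference between $\zeta_n$ and the limiting mixture over words of length at most 4.

```python
from itertools import product


def switching_prob(p, q, theta):
    """Word probabilities of a two-state switching process (a hidden Markov chain).

    Hidden chain: transition matrix [[1 - p, p], [q, 1 - q]], started from its stationary
    law (q, p) / (p + q). In hidden state s the letter is '1' with probability theta[s],
    independently of everything else (hypothetical i.i.d. components).
    """
    trans = [[1 - p, p], [q, 1 - q]]
    start = [q / (p + q), p / (p + q)]

    def emit(s, letter):
        return theta[s] if letter == "1" else 1 - theta[s]

    def prob(word):
        # Forward recursion: f[s] = Pr(letters read so far, current hidden state = s).
        f = [start[s] * emit(s, word[0]) for s in (0, 1)]
        for letter in word[1:]:
            f = [sum(f[r] * trans[r][s] for r in (0, 1)) * emit(s, letter) for s in (0, 1)]
        return sum(f)
    return prob


def limit_prob(weight0, theta):
    """The mixture: with probability weight0 every letter comes from theta[0], else theta[1]."""
    def prob(word):
        ones = word.count("1")

        def iid(t):
            return t ** ones * (1 - t) ** (len(word) - ones)
        return weight0 * iid(theta[0]) + (1 - weight0) * iid(theta[1])
    return prob


def max_gap(prob1, prob2, max_len=4):
    """Largest difference in probability over all binary words of length <= max_len."""
    return max(abs(prob1("".join(w)) - prob2("".join(w)))
               for n in range(1, max_len + 1) for w in product("01", repeat=n))


delta, theta = 0.3, (0.2, 0.9)  # hypothetical parameters
limit = limit_prob(delta, theta)
gaps = []
for n in (10, 100, 10_000):
    p_n = 1 / (n + 1)
    q_n = delta * p_n / (1 - delta)  # so that q_n / (p_n + q_n) = delta
    gaps.append(max_gap(switching_prob(p_n, q_n, theta), limit))
print(gaps)  # expected to shrink roughly in proportion to p_n
```

With these parameters the hidden chain switches at each step with probability $\delta p_n + (1-\delta) q_n = 2\delta p_n$, which is why the gap to the mixture is of order $p_n$ for short words.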